src: clamp idle_diff before computing proc_diff #372

santigimeno · 2025-09-29T15:56:43Z

Avoid unsigned underflow when idle_diff is slightly larger than loop_diff due to clock skew on idle loops.

Fixes: https://github.com/nodesource/nsolid-private/issues/15

Summary by CodeRabbit

Bug Fixes
- Clamped event-loop and idle time calculations to prevent unsigned underflow.
- Corrected idle-vs-loop comparisons to avoid negative diffs producing bogus utilization.
- Reduced spurious spikes in reported event-loop usage for more reliable monitoring.

coderabbitai · 2025-09-29T15:56:51Z

Walkthrough

Adds defensive clamping in event-loop metric calculations: clamp loop_idle_time to loop_duration in uv_metrics_cb_, and clamp idle_diff to loop_diff in get_event_loop_stats_ to prevent unsigned underflow and incorrect utilization values.

Changes

Cohort / File(s)	Summary of changes
Event-loop metric clamping (`src/nsolid/nsolid_api.cc`)	In `uv_metrics_cb_`: if `loop_idle_time > loop_duration`, set `loop_idle_time = loop_duration` to prevent underflow in later proc time calculations. In `get_event_loop_stats_`: if `idle_diff > loop_diff`, set `idle_diff = loop_diff` to avoid unsigned underflow and bogus utilization values.

Sequence Diagram(s)

sequenceDiagram
    participant UV as uv_metrics_cb_
    participant Stats as get_event_loop_stats_
    participant State as loop vars/times

    UV->>State: read loop_idle_time, loop_duration
    alt loop_idle_time > loop_duration
        UV->>State: loop_idle_time = loop_duration
        Note right of State: clamp to prevent unsigned underflow
    end

    Stats->>State: read loop_diff, idle_diff
    alt idle_diff > loop_diff
        Stats->>State: idle_diff = loop_diff
        Note right of State: clamp to avoid bogus utilization
    end
    Stats->>State: compute proc_diff and utilization metrics

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

juanarbol

Poem

I’m a rabbit in C, quick to mend a small blip,
I clipped idle tails so metrics won't slip.
No underflow tunnels, no numbers gone bad,
A tiny safe hop keeps the stats calm and glad. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "src: clamp idle_diff before computing proc_diff" is fully related to the main changes in the changeset. The raw summary indicates that clamping logic was added in two locations (uv_metrics_cb_ and get_event_loop_stats_) to prevent idle_diff from exceeding loop_diff, which directly aligns with the title's description. The title is concise, specific, and clearly communicates the primary technical solution being implemented to address unsigned underflow issues in loop timing calculations. A developer scanning the commit history would quickly understand that this change involves adding clamping logic for idle_diff calculations.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 890194b and a4e371e.

📒 Files selected for processing (1)

src/nsolid/nsolid_api.cc (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/nsolid/nsolid_api.cc

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

juanarbol

SGTM

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13eb35c and 5666509.

📒 Files selected for processing (1)

src/nsolid/nsolid_api.cc (2 hunks)

🔇 Additional comments (1)

src/nsolid/nsolid_api.cc (1)

2165-2171: LGTM! Clamping prevents unsigned underflow.

The clamping logic correctly handles the clock skew issue where idle_diff can be slightly larger than loop_diff on some platforms. This prevents unsigned underflow in the proc_diff calculation on line 2171 and eliminates the bogus utilization values reported in issue #15.
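The clamp described in this comment can be sketched in isolation. This is a minimal illustration and not the actual code in `src/nsolid/nsolid_api.cc`: the helper name `compute_utilization` and its signature are hypothetical, and only `loop_diff`, `idle_diff`, and `proc_diff` come from the PR.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not the real N|Solid function): returns the loop
// utilization in [0, 1] from two unsigned nanosecond deltas.
double compute_utilization(uint64_t loop_diff, uint64_t idle_diff) {
  // Nothing was measured, so avoid dividing by zero.
  if (loop_diff == 0) return 0.0;

  // Clamp first. With unsigned operands, loop_diff - idle_diff wraps around
  // to a huge value whenever idle_diff > loop_diff.
  idle_diff = std::min(idle_diff, loop_diff);

  const uint64_t proc_diff = loop_diff - idle_diff;
  return static_cast<double>(proc_diff) / static_cast<double>(loop_diff);
}
```

With `loop_diff = 1000000` and `idle_diff = 1000005`, the unclamped subtraction would wrap to 18446744073709551611 (2^64 - 5), whereas the clamped version yields a `proc_diff` of 0 and a utilization of 0.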

coderabbitai · 2025-10-16T17:03:32Z

src/nsolid/nsolid_api.cc

+  // Check current_hrtime_ >= exit_time to avoid unsigned underflow when
+  // exit_time is slightly larger due to clock skew (observed on Windows).
+  // Without this check, the subtraction can produce bogus values in the
+  // millions of seconds range.
+  fprintf(stderr, "[DEBUG] exit_time=%llu, prev_call_time=%llu, current_hrtime=%llu, rolling_est_lag=%f\n",
+          (unsigned long long)exit_time,
+          (unsigned long long)stor->prev_call_time_,
+          (unsigned long long)stor->current_hrtime_,
+          envinst->rolling_est_lag_);
  stor->loop_estimated_lag =
-    (exit_time > 0 && stor->prev_call_time_ > exit_time) ?
+    (exit_time > 0 && stor->prev_call_time_ > exit_time &&
+     stor->current_hrtime_ >= exit_time) ?
    (stor->current_hrtime_ - exit_time) / 1e6 : envinst->rolling_est_lag_;
+  fprintf(stderr, "[DEBUG] loop_estimated_lag=%f (condition: exit_time>0=%d, prev>exit=%d, current>=exit=%d)\n",
+          stor->loop_estimated_lag,
+          exit_time > 0,
+          stor->prev_call_time_ > exit_time,
+          stor->current_hrtime_ >= exit_time);


⚠️ Potential issue | 🔴 Critical

Remove debug logging before merging.

The additional guard stor->current_hrtime_ >= exit_time correctly prevents unsigned underflow, but the debug fprintf statements on lines 2206-2210 and 2215-2219 must be removed before merging to production:

- These debug logs will pollute stderr in production environments.
- Performance concern: metrics collection runs frequently (every event loop iteration).
- Portability: %llu may not be correct for uint64_t on all platforms (consider PRIu64 from <cinttypes>).
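For reference, the `PRIu64` alternative mentioned in the last point could look roughly like the sketch below. The helper `format_hrtime` is hypothetical and exists only for illustration. Note that the debug code in the diff casts each value to `unsigned long long` before passing it to `%llu`, which is also well-defined; `PRIu64` simply removes the need for the cast.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a labelled uint64_t value without assuming
// which underlying integer type uint64_t maps to on the platform.
std::string format_hrtime(const char* label, uint64_t value) {
  char buf[64];
  // PRIu64 expands to the correct conversion specifier for uint64_t.
  std::snprintf(buf, sizeof(buf), "%s=%" PRIu64, label, value);
  return std::string(buf);
}
```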

Apply this diff to remove the debug logging:

-  fprintf(stderr, "[DEBUG] exit_time=%llu, prev_call_time=%llu, current_hrtime=%llu, rolling_est_lag=%f\n",
-          (unsigned long long)exit_time,
-          (unsigned long long)stor->prev_call_time_,
-          (unsigned long long)stor->current_hrtime_,
-          envinst->rolling_est_lag_);
   stor->loop_estimated_lag =
     (exit_time > 0 && stor->prev_call_time_ > exit_time &&
      stor->current_hrtime_ >= exit_time) ?
     (stor->current_hrtime_ - exit_time) / 1e6 : envinst->rolling_est_lag_;
-  fprintf(stderr, "[DEBUG] loop_estimated_lag=%f (condition: exit_time>0=%d, prev>exit=%d, current>=exit=%d)\n",
-          stor->loop_estimated_lag,
-          exit_time > 0,
-          stor->prev_call_time_ > exit_time,
-          stor->current_hrtime_ >= exit_time);
   stor->loop_idle_percent = (1 - stor->loop_utilization) * 100;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

  // Check current_hrtime_ >= exit_time to avoid unsigned underflow when
  // exit_time is slightly larger due to clock skew (observed on Windows).
  // Without this check, the subtraction can produce bogus values in the
  // millions of seconds range.
  fprintf(stderr, "[DEBUG] exit_time=%llu, prev_call_time=%llu, current_hrtime=%llu, rolling_est_lag=%f\n",
          (unsigned long long)exit_time,
          (unsigned long long)stor->prev_call_time_,
          (unsigned long long)stor->current_hrtime_,
          envinst->rolling_est_lag_);
  stor->loop_estimated_lag =
    (exit_time > 0 && stor->prev_call_time_ > exit_time) ?
    (exit_time > 0 && stor->prev_call_time_ > exit_time &&
     stor->current_hrtime_ >= exit_time) ?
    (stor->current_hrtime_ - exit_time) / 1e6 : envinst->rolling_est_lag_;
  fprintf(stderr, "[DEBUG] loop_estimated_lag=%f (condition: exit_time>0=%d, prev>exit=%d, current>=exit=%d)\n",
          stor->loop_estimated_lag,
          exit_time > 0,
          stor->prev_call_time_ > exit_time,
          stor->current_hrtime_ >= exit_time);

  // Check current_hrtime_ >= exit_time to avoid unsigned underflow when
  // exit_time is slightly larger due to clock skew (observed on Windows).
  // Without this check, the subtraction can produce bogus values in the
  // millions of seconds range.
  stor->loop_estimated_lag =
    (exit_time > 0 && stor->prev_call_time_ > exit_time &&
     stor->current_hrtime_ >= exit_time) ?
    (stor->current_hrtime_ - exit_time) / 1e6 : envinst->rolling_est_lag_;
  stor->loop_idle_percent = (1 - stor->loop_utilization) * 100;

🤖 Prompt for AI Agents

In src/nsolid/nsolid_api.cc around lines 2202 to 2219, remove the two debug fprintf blocks (the one printing exit_time/prev_call_time/current_hrtime/rolling_est_lag and the one printing loop_estimated_lag and the condition flags) so they don't pollute stderr or impact performance; keep the added guard (stor->current_hrtime_ >= exit_time) and the loop_estimated_lag computation unchanged, and if any future logging is needed use a proper logging macro or PRIu64 via <cinttypes> rather than raw fprintf with %llu.

Not anymore in the PR code.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the debug fprintf statements are still present in the current PR branch
grep -n "fprintf(stderr, \"\[DEBUG\]" src/nsolid/nsolid_api.cc || echo "No debug fprintf statements found"

Length of output: 33

@santigimeno Confirmed! I've verified that the debug fprintf statements have been removed from the current code. Thank you for addressing this concern. The core fix with the stor->current_hrtime_ >= exit_time guard to prevent unsigned underflow looks good. 👍
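As a rough standalone sketch of the guard discussed here: the function name `estimated_lag_ms` and its parameter list are hypothetical, introduced only to make the expression testable outside the class; the condition itself mirrors the one shown in the diff.

```cpp
#include <cstdint>

// Hypothetical free-function version of the guarded expression. All
// timestamps are unsigned nanoseconds; the result is in milliseconds.
double estimated_lag_ms(uint64_t exit_time,
                        uint64_t prev_call_time,
                        uint64_t current_hrtime,
                        double rolling_est_lag) {
  // Subtract only when current_hrtime >= exit_time. Otherwise the unsigned
  // subtraction would wrap around and report an enormous lag.
  const bool can_subtract = exit_time > 0 &&
                            prev_call_time > exit_time &&
                            current_hrtime >= exit_time;
  return can_subtract ? (current_hrtime - exit_time) / 1e6
                      : rolling_est_lag;
}
```

When `exit_time` is slightly ahead of `current_hrtime`, the function falls back to the rolling estimate instead of producing a wrapped value.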

🧬 Code Graph Analysis Results

src/nsolid/nsolid_api.cc: UpdateConfig (EnvList::UpdateConfig)

Summary: Merges a new JSON config into current_config_, detects diffs, and if there are changes:

- increments current_config_version_
- toggles promise tracking if "promiseTracking" is present
- updates the continuous profiler if contCpuProfile/contCpuProfileInterval changed
- (re)starts OTLP, StatsD, and gRPC agents if "otlp", "statsd", or "grpc" keys are present
- updates info_.tags when "tags" changes
- enqueues the new config to on_config_string_q_ and signals process_callbacks_msg_

Parameters:

const nlohmann::json& config: new configuration to merge

Returns: void (performs updates and queues callbacks as needed)

Exceptions/Errors: uses CHECK/ASSERT patterns; on error, behavior follows existing CHECK_EQ paths (likely termination in debug builds)

src/nsolid/nsolid_api.cc: calculateKernelVersion helper and GetKernelVersion

Summary: Linux-specific code to derive a numeric kernel version for telemetry/config purposes.

calculateKernelVersion(): parses kernel version from /proc/version_signature on Ubuntu, or from uname() / release, handling Debian quirks and UNAME26 edge cases. Returns a version as major<<16 | minor<<8 | patch.

GetKernelVersion(): caches the computed kernel_version (uint32_t) and returns it; Linux-specific path is guarded with #ifdef linux.

Parameters:

None (internal helpers use system information)

Returns:

uint32_t kernel_version (cached) on success; 0 if not Linux or unavailable

Exceptions/Errors:

Reads files and uses sscanf/strstr; on errors it returns 0 or 0-equivalent version as fallback.
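The `major<<16 | minor<<8 | patch` encoding mentioned in the summary can be illustrated with a small sketch. The function name `pack_kernel_version` is hypothetical and is not taken from `nsolid_api.cc`; the sketch also assumes that minor and patch each fit in 8 bits, since larger values would overlap the neighbouring field.

```cpp
#include <cstdint>

// Hypothetical illustration of the packed layout described above:
// bits 16 and up hold major, bits 8-15 hold minor, bits 0-7 hold patch.
constexpr uint32_t pack_kernel_version(uint32_t major,
                                       uint32_t minor,
                                       uint32_t patch) {
  return (major << 16) | (minor << 8) | patch;
}
```

One consequence of this layout is that packed versions compare in the same order as the (major, minor, patch) triples, so a plain integer comparison is enough to check for a minimum kernel version.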

Note: The above entries point to code that interacts with the PR-related changes in src/nsolid/nsolid_api.cc, specifically around dynamic reconfiguration (UpdateConfig) and host/kernel environment probing (GetKernelVersion/calculateKernelVersion). If you want, I can extract the exact code blocks for each item into a compact snippet with precise line references.

Clamp idle time values before subtracting from loop duration to avoid unsigned underflow when clock skew causes idle time to be slightly larger than total loop time. This manifests on Windows when the loop is nearly 100% idle.

Applied in two locations:
- uv_metrics_cb_: prevents bogus rolling_est_lag_ values
- get_event_loop_stats_: prevents bogus loop_utilization values

Fixes: nodesource/nsolid-private#15

Clamp idle time values before subtracting from loop duration to avoid unsigned underflow when clock skew causes idle time to be slightly larger than total loop time. This manifests on Windows when the loop is nearly 100% idle.

Applied in two locations:
- uv_metrics_cb_: prevents bogus rolling_est_lag_ values
- get_event_loop_stats_: prevents bogus loop_utilization values

Fixes: nodesource/nsolid-private#15
PR-URL: #372
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: Rafael Gonzaga <[email protected]>

santigimeno · 2025-10-20T10:09:01Z

Landed in cd09ca4.

santigimeno requested review from RafaelGSS and juanarbol September 29, 2025 15:56

santigimeno self-assigned this Sep 29, 2025

juanarbol approved these changes Sep 29, 2025

coderabbitai bot approved these changes Sep 29, 2025

RafaelGSS approved these changes Sep 29, 2025

santigimeno requested a review from EHortua October 7, 2025 15:38

coderabbitai bot requested changes Oct 16, 2025

santigimeno force-pushed the santi/clamp_loop_delay branch from 5666509 to 890194b Compare October 16, 2025 20:46

santigimeno force-pushed the santi/clamp_loop_delay branch from 890194b to a4e371e Compare October 16, 2025 21:28

EHortua approved these changes Oct 16, 2025

santigimeno closed this Oct 20, 2025

santigimeno deleted the santi/clamp_loop_delay branch October 20, 2025 10:09

EHortua added the QA approved label Oct 22, 2025
